Index


Symbols

  • α (alpha)
    • Bonferroni, 153–154
    • definition of, 41
    • level, 206
    • relation to sample sizes, 365–366
    • setting, 43
  • * (asterisk), 155
  • β (beta), 41
  • λ (half‐life), 280–282
  • κ (kappa), 189–190
  • μg (micrograms), 280–282
  • Π (pi), 27–28
  • √ (radical sign), 21
  • Σ (sigma), 27–28
  • γ (skewness coefficient), 121

A

  • absolute values, 23
  • accuracy, 37, 38, 262–264
  • active group, 187
  • actuarial life tables, 307, 311–316, 320–321
  • addition, 18–19
  • additive, 296
  • administrative measurements, 63
  • adverse events, 70
  • agriculture, 1
  • Akaike’s Information Criterion (AIC), 259, 276–277, 342
  • alcohol intake, 94–98
  • alpha (α)
    • Bonferroni, 153–154
    • definition of, 41
    • level, 206
    • relation to sample sizes, 365–366
    • setting, 43
  • alternative hypothesis, 40, 43, 144, 150–151, 324
  • Alzheimer’s disease, 91, 146
  • amputation, 292–293
  • analysis of variance (ANOVA)
    • assessing, 152–157
    • introduction to, 11, 47–49
    • using, 143–145, 158
  • analytic dataset, 76
  • analytic research, 88–90
  • analytic study designs, 91
  • analytic suite, 57–58
  • analyzing data, 7, 9–10, 74–76
  • and rule, 31
  • animal research, 1
  • ANOVA (analysis of variance)
    • assessing, 152–157
    • introduction to, 11, 47–49
    • using, 143–145, 158
  • anticipated enrollment rate, 347
  • antilogarithm, 22, 118–119
  • anti‐synergy, 245–247
  • area under the ROC curve (AUC), 265, 280
  • arguments, 23
  • arithmetic mean, 115–116
  • arrays, 25–27
  • asbestos, 296–298
  • asymptotic confidence limits, 134
  • attrition, 366–367
  • average values, 11, 39–40, 141–158

B

  • background information, 69
  • backward elimination approach, 295
  • bad fit line, 215–216
  • balanced groups, 154
  • bar charts, 113, 126
  • base‐2 logarithms, 22
  • base‐10 logarithms, 22
  • base‐e logarithms, 22
  • baseline survival function, 342–344
  • baseline values, 96
  • beta (β), 41
  • bimodal (two‐peaked) distribution, 114, 117
  • binary logarithms, 22
  • binary variables, 173–174, 177, 236–238
  • binomial distribution, 36, 354–355
  • biology, 1
  • biopsy specimens, 188
  • biostatisticians, 60, 141, 161, 168–169
  • biostatistics, definition of, 1, 7
  • bivariate analysis, 11, 128, 145, 160
  • blinding, 66, 70, 171
  • blocked randomization, 67
  • blood pressure
    • of study participants, 116–117
    • variable name, 18
  • body mass index (BMI), 177
  • bolus, 280
  • Bonferroni adjustment, 153–156
  • box‐and‐whiskers charts, 127–128
  • Bradford Hill’s criteria of causality, 297–298

C

  • calculated values, 37
  • calculations, 8
  • cancer
    • as a categorical variable, 236–238
    • liver, 93–97
    • lung, 296–298
    • relation to weight, 11
    • remission, 318, 325
    • stages, 102
    • survival data, 208, 301–306, 337, 343
  • candidate covariates, 291
  • cannabidiol (CBD), 159–166
  • car accidents, 274–278
  • case report forms (CRFs), 74
  • case reports, 91
  • case series, 91
  • case studies, 91
  • case‐control study, 90, 93–96
  • CAT (computerized tomography) scans, 188
  • categorical data, 112–114
  • categorical variables, 236–238
  • causal inference, 11–12, 87, 90–95, 247
  • CBD (cannabidiol), 159–166
  • censoring, 302–306, 329, 335
  • census, 33–34
  • center, 115
  • centiles, 120
  • central tendency, 115
  • central‐limit theorem (CLT), 134
  • charts and charting. See also graphing
    • bar and pie, 113, 126
    • box‐and‐whiskers, 127–128
    • categorical data, 112–114
    • correlation coefficients, 202–203
    • hazard rates and survival probabilities, 312–315
    • multiple regression, 243–245
    • numerical data, 124–128
    • Poisson regression, 273–276
    • Receiver Operator Characteristics (ROC), 264–265
    • residuals, 222–223
    • scatter plots, 214, 219, 221, 238–240
    • software for, 57–58
    • s‐shaped data, 252–256
    • student t test, 45–47
  • CHD (coronary heart disease), 92–93
  • chemotherapy, 337
  • chi‐square distribution, 165–166, 358–359
  • chi‐square test
    • pros and cons, 167–169
    • sample size, 171–172
    • tables, 13
    • using, 11, 161–167, 174
  • chronic obstructive pulmonary disease (COPD), 302
  • CI (confidence interval)
  • CIR (cumulative incidence rate), 178–179
  • CL (confidence level), 39
  • CL (confidence limits), 130, 134
  • classification tables, 262–264
  • ClinCalc, 172
  • clinical center variable, 338
  • clinical trials, 9–10, 61–74
  • cloud‐based storage, 60
  • CLT (central‐limit theorem), 134
  • cluster sampling, 84
  • Cochrane Collaboration, 297
  • code, 241
  • code‐based methods, 60
  • coding categories, 105–107
  • coefficient of determination, 227
  • coefficient of variation (CV), 120
  • Cohen’s Kappa, 189
  • cohort study, 90, 93–96
  • collecting data
    • introduction to, 1–2, 9–10
    • manually and digitally, 74, 101–110
  • collinearity, 246–247, 266
  • commercial software, 54–58
  • common logarithms, 22
  • comparing averages, 142
  • complete separation, 267–268
  • complicated formulas, 24
  • Comprehensive R Archive Network (CRAN), 58
  • computer science, 18
  • computerized tomography (CAT) scans, 188
  • concordance, 342
  • confidence interval (CI)
  • confidence level (CL), 39
  • confidence limits (CL), 130, 134
  • confidentiality, 71
  • confounding
    • adjusting for, 145, 294–296
    • criteria for, 292
    • definition of, 66, 94
    • residual, 68
  • constants, 17
  • contingency tables, 173–178, 184, 187–188
  • control group, 98, 154, 187
  • convenience sampling, 84–85
  • COPD (chronic obstructive pulmonary disease), 302
  • coronary heart disease (CHD), 92–93
  • correlation, 40, 201–202, 220
  • correlation coefficient
    • analyzing, 203–207
    • description of, 202–203
    • example of, 10, 40
    • straight‐line regression, 227–228
    • table, 242
  • correlational studies, 91–93
  • COVID‐19, 183–184
  • Cox, David (biostatistician), 330
  • Cox proportional hazards regression, 330
  • Cox/Snell R‐square, 259
  • CRAN (Comprehensive R Archive Network), 58
  • CRFs (case report forms), 74
  • crossover design, 65
  • cross‐sectional studies, 87–96, 176
  • cross‐tabulated data
    • analyzing, 160–167, 171–172
    • example of, 112–113, 166
    • introduction to, 11
    • relation to logistic regression, 250
    • tables, 173–174
  • cumulative incidence rate (CIR), 178–179
  • cumulative survival probability, 311–314
  • curved‐line relationships, 214
  • CV (coefficient of variation), 120

D

  • data. See also cross‐tabulated data
  • data close‐out, 76
  • data dictionary, 110
  • data safety monitoring board (DSMB), 73, 76
  • data safety monitoring committee (DSMC), 73
  • data snapshot, 76
  • date data, 108–110
  • date of last contact, 303
  • DBP (diastolic blood pressure), 116–117
  • deciliter (dL), 280
  • decision theory, 10, 39–40
  • degrees of freedom (df)
    • calculating, 147–148, 152–153
    • for chi‐square tests, 166–167, 358–359
  • dementia, 91, 146, 187–188
  • denominator, 41–42
  • dependent variable, 208, 213, 235–245
  • descriptive research, 88–90
  • descriptive study designs, 91
  • desired power, 347
  • desired α level, 347
  • determinants, 191
  • deviation, 119, 258–259
  • df (degrees of freedom)
    • calculating, 147–148
    • for chi‐square tests, 166–167, 358–359
    • numerator and denominator, 152
  • diabetes, 135–136, 236–240. See also Type II diabetes
  • diagnostic procedures, 183–188
  • diastolic blood pressure (DBP), 116–117, 123–124
  • dichotomous variables, 173–174, 249, 269
  • difference, 147–148, 163
  • difference table, 163
  • disease, 191, 193–194
  • dispersion, 115, 119
  • distribution center, 115
  • distributions
    • bimodal (two‐peaked), 114, 117
    • binomial, 36, 354–355
    • chi‐square, 165–166, 358–359
    • exponential, 356
    • Fisher F, 152, 359–360
    • frequency, 47–48
    • leptokurtic and platykurtic, 122
    • normal, 13, 36, 114, 353
    • probability, 35–37
    • sampling, 38
    • statistical, 13
    • student t, 357–358
    • Weibull, 330, 356–357
  • District of Columbia, 34–35
  • division, 20
  • dL (deciliter), 280
  • dose‐response relationship, 298
  • double‐blinding, 66, 97
  • double‐precision numbers, 107
  • drug description, 70
  • drug development research, 280–282
  • DSMB (data safety monitoring board), 73, 76
  • DSMC (data safety monitoring committee), 73
  • Dupont, William D. (biostatistician), 60

E

  • ECG (electrocardiogram), 188
  • ecologic fallacy, 93
  • ecologic studies, 91–93
  • effect modification, 296–297
  • effect size
    • compared to power and sample size, 45–47
    • definitions of, 362–364
    • example of, 39
    • of importance, 158, 206, 361
  • efficacy objectives, 62–63
  • electrocardiogram (ECG), 188
  • elements, 24, 27–28
  • elimination rate constant, 282
  • engineering, 18
  • enzyme levels, 125–128
  • Epi Info, 59
  • epidemiological research, 1, 9–12
  • epidemiology, 191, 291–298
  • Epidemiology for Dummies (Mitra), 92
  • equal variance, 143
  • equations, 15, 24–25, 35–36
  • error, 78
  • estimate value, 242
  • estimation theory, 10
  • event status variable, 337
  • evidence levels, 91
  • Excel (Microsoft)
    • for data collection, 103, 105, 107–110
    • functions of, 57
    • for log‐rank tests, 319–320
    • for randomization, 67
    • for straight‐line regression, 217
    • for survival regression, 343
  • exclusion, 74
  • exclusion criteria, 64
  • expected count, 164
  • expected survival, 329, 331, 343–346
  • experimental research, 61, 88–90, 97–98
  • experiments, 1, 9–10, 91
  • expert opinion, 91
  • explicit constants, 17
  • exploratory analysis, 294
  • exploratory efficacy objective, 63
  • exploratory objectives, 62–63
  • exponential distribution, 356
  • exponential increase, 277
  • exponentiating, 21
  • exposure, 83, 175, 178, 193, 292
  • extrapolation, 108

F

  • F ratio, 152
  • F statistic, 228, 242
  • F value (value of F statistic), 155
  • factorials, 22
  • failure times, 356–357
  • fasting glucose values, 25–26, 153
  • fat intake, 92–93
  • FDA (Food and Drug Administration), 71
  • first quartile, 222
  • Fisher, Ronald Aylmer (biostatistician), 196
  • Fisher Exact test, 11, 13, 169–172
  • Fisher F distribution, 152, 359–360
  • Fisher z transformation, 204–205
  • fleas, 7
  • floating point numbers, 107
  • Food and Drug Administration (FDA), 71
  • formulas
    • building blocks of, 17
    • creating, 11–12
    • definition of, 15
    • introduction to, 1–2, 7–8
    • types of, 16, 24
  • forward stepwise approach, 294–295
  • fourfold tables, 11, 173–178, 184, 187–188
  • fractional numbers, 107
  • free software, 58–60
  • free‐text data, 103
  • frequency bar charts, 113
  • frequency distribution, 47–48
  • functions, 23

G

  • gamma‐ray radiation, 251–256, 260–263, 267, 337–338
  • Gaussian distribution, 353
  • generalized linear model (GLM), 272–278
  • genetics, 1
  • geometric mean (GM), 118–119
  • GLM (generalized linear model), 272–278
  • glucose values, 25–26, 153
  • gold standard test, 183–184
  • good fit line, 215–216
  • goodness of fit, 227–228, 258–259
  • G*Power
  • graphing. See also charts and charting
    • categorical data, 112–114
    • correlation coefficients, 202–203
    • hazard rates and survival probabilities, 312–315
    • multiple regression, 243–245
    • numerical data, 124–128
    • Poisson regression, 273–276
    • Receiver Operator Characteristics (ROC), 264–265
    • residuals, 222–223
    • software for, 57–58
    • s‐shaped data, 252–256
    • student t test, 45–47
  • GraphPad, 67
  • Greek letters, 17
  • GUI (graphical user interface), 56, 58

H

  • h value, 344–346
  • half‐life (λ), 280–282
  • hazard rate
    • definition of, 305
    • from life tables, 311–315
    • relation to survival rate, 333
  • hazard ratios (HR), 334–335, 340, 364
  • health insurance, 112–114
  • healthcare, 9–10
  • highway accidents, 274–278
  • Hill, Bradford (epidemiologist)
    • Bradford Hill’s criteria of causality, 297–298
  • histogram, 34–35, 124–125
  • historical control, 142
  • H‐L test, 259
  • homogeneity of variances, 155
  • hormone concentration, 287–290
  • Hosmer‐Lemeshow Goodness of Fit test, 259
  • HR (hazard ratios), 334–335, 340, 364
  • HTN (hypertension), 90, 94–98, 177–178
  • human health research, 88
  • human subjects protection certification, 73
  • hyperplane, 234
  • hypertension (HTN), 90, 94–98, 177–178
  • hypothesis, 40–47, 63, 94, 247. See also null hypothesis
  • hypothesis‐driven analysis, 294
  • hypothesized cause, 83, 175, 178, 193, 292

I

  • ICF (Informed Consent Form), 72–73
  • ICH (International Conference on Harmonization), 72
  • icons explained, 3
  • identification (ID) numbers, 104
  • identity line, 245
  • imputation, 75
  • incidence, 191–198
  • incidence rate, 192–198
  • inclusion criteria, 64
  • independent t test, 148
  • independent variable, 208–210, 213, 215, 291–294
  • indicator variables, 237–238
  • indices, 174
  • individual‐level data, 160
  • inferential statistics, 34, 77
  • inferring, 10
  • infinity, 33
  • influenza, 192
  • Informed Consent Form (ICF), 72–73
  • inner mean, 118
  • Institutional Review Board (IRB), 72
  • integers, 107
  • interaction, 296–297
  • interaction terms, 237, 329
  • intercept, 215, 221, 224–225, 272, 329
  • intercept row, 224
  • interim analysis, 76
  • International Conference on Harmonization (ICH), 72
  • Internet sources. See also G*Power; Microsoft Excel
    • ClinCalc, 14
    • Comprehensive R Archive Network (CRAN), 58
    • Epi Info, 59
    • GraphPad, 67
    • International Business Machines, 57
    • National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
    • National Institutes of Health (NIH), 72–73
    • OpenStat and LazStats, 59
    • Power and Sample Size Calculation (PS), 60, 171–172, 324–325
    • SAS OnDemand for Academics (ODA), 55–56
    • Statista, 34
    • StatPages, 158, 190, 282
  • interpolation, 208
  • inter‐quartile range (IQR), 120
  • inter‐rater reliability, 188–190
  • interval data, 102
  • intervention‐related measurements, 63
  • interventions, 61–62
  • intra‐rater reliability, 188–190
  • IQR (inter‐quartile range), 120
  • IRB (Institutional Review Board), 72
  • iterative models, 246–247

K

  • Kaplan‐Meier method, 313–316, 328, 330
  • Kaplan‐Meier (K‐M) survival estimate, 313, 317
  • kappa (κ), 189–190
  • kilograms (kg), 218–220, 225–226, 239
  • Kruskal, William (statistician), 141
  • Kruskal‐Wallis test, 47–49, 144, 157
  • kurtosis, 121–122

L

  • Last Observation Carried Forward (LOCF), 75
  • last‐seen date, 303
  • LazStats, 59
  • least‐squares line, 234
  • left skewed data, 121
  • leptokurtic distribution, 122
  • lethal dose, 262
  • levels of measurement, 102–103
  • LFU (lost to follow‐up), 303–304, 308–309
  • life sciences, 1
  • life‐table method, 307, 311–316, 320–321
  • Likert scale, 102, 106
  • line of best fit, 215–216
  • linear combination, 329
  • linear function, 210–211
  • linear model, 272–273
  • linear regression, 210
  • link function, 273–274
  • liver cancer, 93–97
  • locally weighted scatterplot smoothing (LOWESS) curve‐fitting, 12, 286–290
  • LOCF (Last Observation Carried Forward), 75
  • logarithms, 21–22, 118–119
  • logistic regression
    • basics of, 251–254
    • definition of, 12, 210
    • disadvantages of, 266–268
    • evaluating, 257–265
    • sample size for, 268–269
    • using, 249–250, 255–257
  • log‐normal distribution, 36, 125, 353–354
  • log‐rank test, 317–325, 328–330
  • longitudinal research, 90
  • lost to follow‐up (LFU), 303–304, 308–309
  • LOWESS (locally weighted scatterplot smoothing) curve‐fitting, 12, 286–290
  • lung cancer, 296–298

M

  • Mann, Henry (professor), 141
  • Mann‐Whitney U test, 47–49, 143, 362
  • Mantel‐Cox test. See log‐rank test
  • Mantel‐Haenszel chi‐square test, 168
  • margin of error (ME), 134
  • marginal totals, 160
  • masking, 66, 70, 171
  • mathematical expressions, 15
  • mathematical operations, 18–25
  • matrix, 26
  • matrix algebra, 26
  • maximum value, 222
  • ME (margin of error), 134
  • mean
    • arithmetic, 115–116
    • compared to other values, 142–157, 362
    • confidence limits, 134–135
  • mean square (mean Sq), 155
  • measurements, 63–64, 102–103
  • mechanical function, 279
  • median, 116–117, 123, 222
  • meta‐analyses, 97–98
  • metadata, 110
  • mice, 1, 318
  • micrograms (μg), 280–282
  • Microsoft Excel
    • for data collection, 103, 105, 107–110
    • functions of, 57
    • for log‐rank tests, 319–320
    • for randomization, 67
    • for straight‐line regression, 217
    • for survival regression, 343
  • millimeters of mercury (mmHg), 218–226, 229, 239
  • minimum value, 222
  • missing data, 74–75
  • Mitra, Amal K. (author)
    • Epidemiology for Dummies, 92
  • mmHg (millimeters of mercury), 218–226, 229, 239
  • mode, 117
  • model building, 246
  • model fit statistics, 242
  • models
    • generalized linear (GLM), 272–278
    • linear, 272–273
    • null, 228, 242, 259
    • parsimonious, 293
    • predictive, 228–229
    • regression, 68, 208–209
  • molecular biology, 7
  • multicollinearity, 246–247
  • multi‐dimensional arrays, 26–27
  • multilevel variable, 236
  • multiple regression
    • basics of, 234–235
    • introduction to, 26
    • sample size for, 247–248
    • special considerations, 245–247
    • using, 236–245
  • multiple R‐squared, 242
  • multiplication, 18–20
  • multiplicative, 296
  • multiplicity, 75–76
  • multi‐site study, 104
  • multi‐stage sampling, 85–86
  • multivariable regression, 291
  • multivariate analysis, 128, 145, 291

N

  • Nagelkerke R‐square, 259
  • National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
  • National Institutes of Health (NIH), 72–73
  • natural logarithms, 22
  • negative predictive value (NPV), 187
  • negatively skewed data, 121
  • NHANES (National Health and Nutrition Examination Survey), 82, 86, 93, 148–157
  • NIH (National Institutes of Health), 72–73
  • nominal variables, 102
  • non‐code‐based methods, 60
  • nonlinear function, 211
  • nonlinear least‐squares regression, 12
  • nonlinear regression, 279–286
  • nonlinear trends, 277
  • nonparametric regression, 286–290
  • nonparametric tests, 48–49, 157
  • non‐proportional hazards, 323
  • non‐sampling error, 78
  • non‐steroidal anti‐inflammatory drugs (NSAIDs), 159–166
  • normal distribution, 13, 36, 114, 353
  • normal Q‐Q graph, 222–223
  • normal‐based confidence intervals, 134
  • normal‐based confidence limits, 134
  • normality assumption, 143
  • not rule, 31
  • notches, 128
  • NPV (negative predictive value), 187
  • NSAIDs (non‐steroidal anti‐inflammatory drugs), 159–166
  • nuisance variables, 145
  • null hypothesis
  • null model, 228, 242, 259
  • numerator, 41–42
  • numerical data, 107–109, 114–123

O

  • obesity, 177–178, 182–183
  • observational research, 88–90
  • observed count, 164
  • observed versus predicted graph, 245
  • ODA (SAS OnDemand for Academics), 55–56
  • odds, 32–33, 181
  • odds ratio (OR), 94–96, 181–183, 266–267
  • Office for Human Research Protections (OHRP), 72
  • one‐dimensional arrays, 25
  • one‐group t test, 148
  • one‐sided confidence interval, 133
  • one‐way ANOVA, 144
  • open‐source software, 58–59
  • OpenStat, 59
  • OR (odds ratio), 94–96, 181–183, 266–267
  • or rule, 31
  • order of operation, 24
  • ordinal data, 102
  • ordinary multiple linear regression model. See multiple regression
  • ordinary regression, 210
  • outcome, 208
  • outcome‐related measurements, 64
  • outliers, 229
  • overall accuracy, 185

P

  • p value
  • paired t test, 148
  • paired values, 363
  • parabolic relationship, 214, 252–253
  • parallel design, 65
  • parameters, 33–34, 77, 208, 279
  • parametric tests, 48–49
  • parsimonious models, 293
  • parsimony, 293
  • participant identification (ID), 240
  • participant study identifier, 104
  • participants. See also sample size; samples
    • enrolling, 68
    • protection for, 71–73
    • selecting, 64–65, 236–237
  • PatSat, 106
  • PCR (polymerase chain reaction), 184
  • Pearson, Karl (biostatistician), 161
  • Pearson Correlation test, 47–49, 227
  • Pearson kurtosis index, 122
  • percentile, 120
  • perfect predictor problem, 267–268
  • perfect separation, 267–268
  • periodicity, 83
  • PH (proportional hazards regression), 330–331, 333–334
  • pharmacokinetic (PK) properties, 280–282
  • pi (Π), 27–28
  • pie charts, 113
  • pilot study, 230
  • placebo, 66–67, 171, 187–188
  • placebo effect, 66, 187–188
  • plain text format, 16, 24
  • platykurtic distribution, 122
  • Plummer, Walton D. (biostatistician), 60
  • pointy‐topped distribution, 114
  • Poisson distribution, 36, 355
  • Poisson regression
    • definition of, 12, 210
    • using, 271–278
  • polymerase chain reaction (PCR), 184
  • populations, 33–37, 175
  • positive predictive value (PPV), 187
  • positively skewed data, 121
  • post‐hoc tests, 143, 152–157
  • potential confounding variables, 64
  • Power and Sample Size Calculation (PS)
    • for chi‐square and Fisher exact tests, 171–172
    • definition of, 60
    • for survival comparisons, 324–325
  • power calculations, 47, 171–172, 361, 365
  • powers, 20–21, 41, 44, 45–47, 206
  • PPV (positive predictive value), 187
  • precision, 37, 38, 147
  • predicted values, 242
  • predictive model, 228–229
  • predictive value negative, 187
  • predictive value positive, 187
  • predictors
    • introduction to, 208, 233
    • in iterative models, 246–247
    • in logistic models, 255–256
    • in regression models, 273–274, 279
    • relation to the outcome, 242, 245–246, 250
    • types of, 209, 235–236
  • pregnancy, 171, 185–187
  • prevalence, 179, 186, 191–194
  • prevalence ratio, 179
  • primary diagnosis (PrimaryDx), 236–238
  • primary efficacy objective, 62
  • primary objectives, 62
  • primary sampling units (PSU), 86
  • privacy, 71
  • probability, 30–33
  • probability bell curve, 353
  • probability distributions, 35–37
  • probability of independence, 166
  • procedural descriptions, 70
  • product, of an array, 27
  • prognosis curves, 329, 331, 343–346
  • proportional hazards (PH) regression, 330–331, 333–334
  • proportions, 11, 135–136, 363
  • protective factor, 178
  • protractor, 113
  • PS (Power and Sample Size Calculation)
    • for chi‐square and Fisher exact tests, 171–172
    • definition of, 60
    • for survival comparisons, 324–325
  • pseudo‐r‐squared values, 259
  • PSU (primary sampling units), 86
  • Python, 58

R

  • R (software)
    • description of, 58
    • nonlinear regression, 282–286
    • odds ratio calculation, 183
    • risk ratio calculation, 180–181
    • straight‐line regression, 221
  • r value, 203–206
  • radiation exposure, 251–256, 260–263, 267, 337–338
  • radical sign (√), 21
  • random number generator (RNG), 80–81
  • random shuffling, 67
  • random variability, 158
  • randomization, 97, 171
  • randomized controlled trials (RCTs), 65–67, 98
  • randomness, 33
  • range, 120
  • ranks, 49
  • rate ratio (RR), 195–196
  • ratio data, 102, 107–108
  • rationale, 69
  • RCTs (randomized controlled trials), 65–67, 98
  • Receiver Operator Characteristics (ROC), 257, 264–265
  • reference level, 237
  • regression
    • logistic
      • basics of, 251–254
      • definition of, 12, 210
      • disadvantages of, 266–268
      • evaluating, 257–265
      • sample size for, 268–269
      • using, 249–250, 255–257
    • multiple
      • basics of, 234–235
      • introduction to, 26
      • sample size for, 247–248
      • special considerations, 245–247
      • using, 236–245
    • multivariable, 291
    • ordinary, 210
    • straight‐line
      • basics of, 215–216
      • disadvantages of, 229–231
      • evaluating, 220–224
      • using, 216–220
      • when to use, 213–215
    • survival
      • concepts of, 329–335
      • definition of, 210
      • evaluating, 337–343
      • sample size for, 346–347
      • using, 335–336
      • when to use, 328–329, 343
    • univariate, 209
  • regression analysis, 12, 207–208
  • regression models, 68, 208–209
  • relative frequency, 30
  • relative risk, 95, 178–181
  • REM (Roentgen Equivalent Man), 251–254, 260–262, 267
  • research
    • analytic, descriptive and observational, 88–90
    • animal, 1
    • epidemiological, 1, 9–12
    • experimental, 61, 88–90, 97–98
    • human health, 88
    • longitudinal, 90
  • research studies, 1
  • residual information, 242
  • residual standard error, 222, 242
  • residuals, 222–224, 242–245
  • residuals versus fitted graph, 222–223
  • retinopathy, 292–293
  • right skewed data, 121
  • risk ratio, 96, 178–181
  • RMS (root‐mean‐square), 222
  • RNG (random number generator), 80–81
  • ROC (Receiver Operator Characteristics), 257, 264–265
  • Roentgen Equivalent Man (REM), 251–254, 260–262, 267
  • Roman letters, 17
  • root‐mean‐square (RMS), 222
  • roots, 21
  • Rothman’s causal pie, 297–298
  • round‐off error, 352
  • RR (rate ratio), 195–196
  • RStudio, 58
  • Rumsey, Deborah J. (author)
    • Statistics For Dummies, 2, 29
    • Statistics II for Dummies, 29

S

  • safety considerations, 70
  • safety objectives, 62–63
  • safety study, 62
  • sample size
    • for chi‐square and Fisher exact tests, 171–172
    • for cohort studies, 95–96
    • compared to power and effect size, 44–47
    • for comparing averages, 158
    • for correlation tests, 206–207
    • estimating, 198, 361–367
    • introduction to, 14
    • for logistic regression, 268–269
    • for multiple regression, 247–248
    • relation to confidence intervals, 130–131
    • for straight‐line regression, 230–231
    • for survival comparisons, 324–325
    • for survival regression, 346–347
  • sample statistic, 175
  • samples. See also sample size
    • framing, 78–79
    • introduction to, 9–10, 14
    • selecting, 33–37, 68, 78–80
    • types of, 80–86
  • sampling clusters, 83–84
  • sampling distribution, 38
  • sampling error, 34, 78
  • sampling frame, 78
  • sampling strategies, 175–176
  • SAP (Statistical Analysis Plan), 70
  • SAS (Statistical Analysis System), 54–56, 58, 110
  • SAS OnDemand for Academics (ODA), 55–56
  • SBP (systolic blood pressure)
    • comparing, 39
    • effect of drugs on, 62, 123–125
    • relation between weight and, 218–229
    • variable name of, 18
  • scatter plots
  • Scheffe’s test, 154–156
  • scientific notation, 28
  • screening tests, 184–187
  • SD (standard deviation), 119, 123, 143, 222
  • SE (standard error)
    • calculating, 147–148, 179–180
    • of coefficients, 225–226, 340
    • compared to confidence intervals, 130–131
    • description of, 38, 129, 163, 242
    • in a fraction, 41
  • secondary efficacy objective, 62
  • secondary objectives, 62
  • sensitivity, 185–186, 262–264
  • sigma (Σ), 27–28
  • significance, 40, 42–43, 174
  • significance tests. See statistical tests; specific test names
  • significant association, 11
  • significant correlation, 363–364
  • simple formulas, 24
  • simple random samples (SRS), 80–81
  • simple randomization, 66
  • simple regression, 209
  • simulation, 79
  • single‐blinding, 66
  • single‐precision numbers, 107
  • single‐site study, 104
  • skewed data, 11, 114, 121, 353
  • skewness, 121
  • skewness coefficient (γ), 121
  • slope row, 224
  • slopes, 215–217, 221, 224–231
  • smoking, 296–298, 334–335
  • smoothing fraction, 289–290
  • Social Science Statistics, 170
  • software
    • for data collection, 105–110, 241
    • evolution of, 54
    • introduction to, 8
    • for logistic regression, 256–257
    • for power calculations, 47
    • for straight‐line regression, 217
    • types of, 54–60
    • variables in, 18
  • Spearman Rank Correlation test, 47–49
  • specificity, 185–186, 262–264
  • spot‐checking, 109
  • SPSS (Statistical Package for the Social Sciences), 54–58
  • SQL (Structured Query Language), 110
  • square root, 332
  • square root law, 131
  • squaring, 332
  • SRS (simple random samples), 80–81
  • s‐shaped relationship, 214, 252–256
  • SSQ (sum of squares), 155, 215–216, 234
  • standard deviation (SD), 119, 123, 143, 222
  • standard error (SE)
    • calculating, 147–148, 179–180
    • of coefficients, 225–226, 340
    • compared to confidence intervals, 130–131
    • description of, 38, 129, 163, 242
    • in a fraction, 41
  • statistic, definition of, 40
  • Statistical Analysis Plan (SAP), 70
  • Statistical Analysis System (SAS), 54–56, 58, 110
  • statistical distributions, 13
  • statistical estimation theory, 10, 37–39
  • statistical inference, 10, 37
  • Statistical Package for the Social Sciences (SPSS), 54–58
  • statistical tests, 8, 40–41, 44, 47–49. See also tests
  • statistically rare, 93
  • Statistics For Dummies (Rumsey), 2, 29
  • Statistics II For Dummies (Rumsey), 29
  • StatPages, 158, 190, 282
  • stepped line charts, 312–313
  • stepwise selection, 195
  • storage modes, 107
  • straight‐line regression
    • basics of, 215–216
    • disadvantages of, 229–231
    • evaluating, 220–224
    • using, 216–220
    • when to use, 213–215
  • stratified samples, 81–82
  • strong linear relationship, 214
  • Structured Query Language (SQL), 110
  • student t distribution, 357–358
  • student t test, 41–42, 47–49, 142–152, 362–363
  • student t value, 226
  • study design, 2, 7, 14, 88–90
  • study protocol, 68–71
  • study rationale, 69
  • study title, 69
  • subtraction, 18–19
  • sum, 27
  • sum of squares (SSQ), 155, 215–216, 234
  • summarizing data
    • categorical, 112–114
    • numerical, 114–123
    • survival, 302–316
  • summary statistics, 111
  • surveillance, 89–90
  • survival analysis, 13, 68, 301
  • survival curve shapes, 329–330
  • survival data, 208, 301–306, 337, 343
  • survival rate, 305, 364
  • survival regression
    • concepts of, 329–335
    • definition of, 210
    • evaluating, 337–343
    • sample size for, 346–347
    • using, 335–336
    • when to use, 328–329, 343
  • survival time, 301–306, 317–325
  • symbolic constants, 17
  • symmetry, 115
  • synergy, 245–247
  • systematic error, 37
  • systematic reviews, 97–98
  • systematic sampling, 82–83
  • systolic blood pressure (SBP)
    • comparing, 39
    • effect of drugs on, 62, 123–125
    • relation between weight and, 218–229
    • variable name of, 18

T

  • t tests, 41–42, 47–49, 142–152, 362–363
  • t value, 242
  • Tableau, 57, 60
  • terminal elimination rate constant, 12
  • test statistic, 37, 40, 41–42, 147
  • tests
    • chi‐square, 11, 13, 161–169, 171–172, 174
    • Fisher exact, 11, 13, 169–172
    • H‐L, 259
    • log‐rank, 317–325, 328–330
    • Mann‐Whitney U, 47–49, 143, 362
    • nonparametric, 48–49, 157
    • post‐hoc, 143, 152–157
    • Scheffe’s, 154–156
    • Spearman Rank Correlation, 47–49
    • student t, 41–42, 47–49, 142–152, 362–363
    • Tukey‐Kramer, 154–156
    • unequal‐variance t, 143
    • Wilcoxon Signed‐Ranks (WSR), 47–49, 142, 146
    • Wilcoxon Sum‐of‐Ranks, 47–49, 143, 157, 362
  • theoretical function, 279
  • third quartile, 222
  • three‐way ANOVA, 144–145
  • tied values, 49
  • time data, 108–110
  • time‐to‐event variable, 337
  • treatment bias, 66
  • treatment periods, 65
  • treatments, 187–188
  • trees, 1
  • trend line, 276
  • trimmed mean. See inner mean
  • true value, 37
  • Tukey‐Kramer test, 154–156
  • Tukey’s HSD (“honestly” significant difference test), 154–156
  • two‐dimensional arrays, 26
  • two‐peaked (bimodal) distribution, 114, 117
  • type I error, 41, 42–44, 75–76, 152
  • Type II diabetes, 10–12, 192–197, 250, 292
  • type II error, 41–44
  • typeset format, 16, 24–25
  • typographic effects, 16

U

  • ultrasound, 188
  • unbalanced groups, 154
  • under‐coverage, 78
  • unequal‐variance t test, 143
  • uniform distribution, 352
  • United States
    • airports, 34–35
    • census, 84
    • Institutional Review Board, 72
    • surveillance study, 86
  • univariate analysis, 128
  • univariate regression, 209
  • Universität Düsseldorf, 59
  • unskewed data, 121

V

  • value of F statistic (F value), 155
  • values. See also p value
    • absolute, 23
    • average, 11, 39–40, 141–158
    • calculated, 37
    • estimate, predicted and t values, 242
    • F value, 155
    • h value, 344–346
    • paired, 363
    • positive predictive, 187
    • pseudo‐r‐squared, 259
    • r value, 203–206
    • tied, 49
  • Vanderbilt University, 60
  • Vanderbilt University Medical Center, 324–325
  • variable names, 110
  • variable width, 128
  • variables
  • variance, 119, 143, 155
  • variance table, 155
  • viruses, 1
  • Viya, 56, 60
  • volume of distribution (Vd), 280–282

W

  • Wallis, Wilson Allen (statistician), 141
  • washout intervals, 65
  • waves, 96, 176
  • weak linear relationship, 214
  • websites. See also G*Power; Microsoft Excel
    • ClinCalc, 172
    • Cochrane, 297
    • Comprehensive R Archive Network (CRAN), 58
    • Epi Info, 59
    • GraphPad, 67
    • International Business Machines, 57
    • National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
    • National Institutes of Health (NIH), 72–73
    • OpenStat and LazStats, 59
    • Power and Sample Size Calculation (PS), 60, 171–172, 324–325
    • SAS OnDemand for Academics (ODA), 55–56, 57
    • Social Science Statistics, 170
    • Statista, 34
    • StatPages, 158, 190, 282
  • Weibull distribution, 330, 356–357
  • weight, 218–220, 225–226, 239–240, 246
  • Welch, Bernard Lewis (statistician), 141
  • Welch test, 143, 150–151
  • Whitney, Donald Ransom (statistician), 141
  • whole numbers, 107
  • Wilcoxon, Frank (statistician), 141
  • Wilcoxon Signed‐Ranks (WSR) test, 47–49, 142, 146
  • Wilcoxon Sum‐of‐Ranks test, 47–49, 143, 157, 362
  • withdrawal criteria, 64
  • World War II, 71, 264

X

  • X variable, 213–215, 225–228
  • x‐rays, 188

Y

  • Y variable, 213–216, 224–228
  • Yates, Frank (statistician), 168–169
  • Yates continuity correction, 168–169